Data: Import, clean and pre-process the data
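The import step behind the cells below can be sketched as follows. The real data source is not shown in this export, so a `StringIO` stand-in (built from the first rows of `df.head()` shown further down) is used here purely as an assumption for illustration.

```python
import io
import pandas as pd

# Toy stand-in for the real data file (assumption): a few rows and columns
# taken from the df.head() output shown below.
csv_data = """compactness,circularity,class
95,48.0,van
91,41.0,van
104,50.0,car
"""
df = pd.read_csv(io.StringIO(csv_data))
print(df.shape)   # (rows, columns) of the toy frame
print(df.head())
```

With the full dataset, the same `pd.read_csv` call yields the 846 × 19 frame described below.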

In [1]:
In [6]:
In [7]:
Out[7]:
compactness circularity distance_circularity radius_ratio pr.axis_aspect_ratio max.length_aspect_ratio scatter_ratio elongatedness pr.axis_rectangularity max.length_rectangularity scaled_variance scaled_variance.1 scaled_radius_of_gyration scaled_radius_of_gyration.1 skewness_about skewness_about.1 skewness_about.2 hollows_ratio class
0 95 48.0 83.0 178.0 72.0 10 162.0 42.0 20.0 159 176.0 379.0 184.0 70.0 6.0 16.0 187.0 197 van
1 91 41.0 84.0 141.0 57.0 9 149.0 45.0 19.0 143 170.0 330.0 158.0 72.0 9.0 14.0 189.0 199 van
2 104 50.0 106.0 209.0 66.0 10 207.0 32.0 23.0 158 223.0 635.0 220.0 73.0 14.0 9.0 188.0 196 car
3 93 41.0 82.0 159.0 63.0 9 144.0 46.0 19.0 143 160.0 309.0 127.0 63.0 6.0 10.0 199.0 207 van
4 85 44.0 70.0 205.0 103.0 52 149.0 45.0 19.0 144 241.0 325.0 188.0 127.0 9.0 11.0 180.0 183 bus
In [8]:
In [9]:
The dataset contains 846 rows and 19 columns.
In [10]:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 846 entries, 0 to 845
Data columns (total 19 columns):
 #   Column                       Non-Null Count  Dtype  
---  ------                       --------------  -----  
 0   compactness                  846 non-null    int64  
 1   circularity                  841 non-null    float64
 2   distance_circularity         842 non-null    float64
 3   radius_ratio                 840 non-null    float64
 4   pr.axis_aspect_ratio         844 non-null    float64
 5   max.length_aspect_ratio      846 non-null    int64  
 6   scatter_ratio                845 non-null    float64
 7   elongatedness                845 non-null    float64
 8   pr.axis_rectangularity       843 non-null    float64
 9   max.length_rectangularity    846 non-null    int64  
 10  scaled_variance              843 non-null    float64
 11  scaled_variance.1            844 non-null    float64
 12  scaled_radius_of_gyration    844 non-null    float64
 13  scaled_radius_of_gyration.1  842 non-null    float64
 14  skewness_about               840 non-null    float64
 15  skewness_about.1             845 non-null    float64
 16  skewness_about.2             845 non-null    float64
 17  hollows_ratio                846 non-null    int64  
 18  class                        846 non-null    object 
dtypes: float64(14), int64(4), object(1)
memory usage: 125.7+ KB
In [11]:
Duplicated rows:  0
In [12]:
Out[12]:
compactness                    0
circularity                    5
distance_circularity           4
radius_ratio                   6
pr.axis_aspect_ratio           2
max.length_aspect_ratio        0
scatter_ratio                  1
elongatedness                  1
pr.axis_rectangularity         3
max.length_rectangularity      0
scaled_variance                3
scaled_variance.1              2
scaled_radius_of_gyration      2
scaled_radius_of_gyration.1    4
skewness_about                 6
skewness_about.1               1
skewness_about.2               1
hollows_ratio                  0
class                          0
dtype: int64
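The per-column null counts above are followed (In [13]) by an imputation step whose code is not shown; a plausible minimal sketch, assuming median imputation on numeric columns and using a tiny synthetic frame, is:

```python
import numpy as np
import pandas as pd

# Toy frame with a few NaNs standing in for the real dataset (assumption).
df = pd.DataFrame({
    "circularity":  [48.0, np.nan, 50.0, 41.0],
    "radius_ratio": [178.0, 141.0, np.nan, 159.0],
    "class":        ["van", "van", "car", "van"],
})
print(df.isnull().sum())                       # per-column NaN counts
num_cols = df.select_dtypes(include="number").columns
# Fill each numeric column's NaNs with that column's median.
df[num_cols] = df[num_cols].fillna(df[num_cols].median())
```

After this step every column has 846 non-null values, which is consistent with the `describe()` output below.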
In [13]:
In [14]:
Out[14]:
count mean std min 25% 50% 75% max
compactness 846.0 93.678487 8.234474 73.0 87.00 93.0 100.00 119.0
circularity 846.0 44.823877 6.134272 33.0 40.00 44.0 49.00 59.0
distance_circularity 846.0 82.100473 15.741569 40.0 70.00 80.0 98.00 112.0
radius_ratio 846.0 168.874704 33.401356 104.0 141.00 167.0 195.00 333.0
pr.axis_aspect_ratio 846.0 61.677305 7.882188 47.0 57.00 61.0 65.00 138.0
max.length_aspect_ratio 846.0 8.567376 4.601217 2.0 7.00 8.0 10.00 55.0
scatter_ratio 846.0 168.887707 33.197710 112.0 147.00 157.0 198.00 265.0
elongatedness 846.0 40.936170 7.811882 26.0 33.00 43.0 46.00 61.0
pr.axis_rectangularity 846.0 20.580378 2.588558 17.0 19.00 20.0 23.00 29.0
max.length_rectangularity 846.0 147.998818 14.515652 118.0 137.00 146.0 159.00 188.0
scaled_variance 846.0 188.596927 31.360427 130.0 167.00 179.0 217.00 320.0
scaled_variance.1 846.0 439.314421 176.496341 184.0 318.25 363.5 586.75 1018.0
scaled_radius_of_gyration 846.0 174.706856 32.546277 109.0 149.00 173.5 198.00 268.0
scaled_radius_of_gyration.1 846.0 72.443262 7.468734 59.0 67.00 71.5 75.00 135.0
skewness_about 846.0 6.361702 4.903244 0.0 2.00 6.0 9.00 22.0
skewness_about.1 846.0 12.600473 8.930962 0.0 5.00 11.0 19.00 41.0
skewness_about.2 846.0 188.918440 6.152247 176.0 184.00 188.0 193.00 206.0
hollows_ratio 846.0 195.632388 7.438797 181.0 190.25 197.0 201.00 211.0

EDA and visualisation:

In [15]:
In [16]:
car    429
bus    218
van    199
Name: class, dtype: int64
Out[16]:
<matplotlib.axes._subplots.AxesSubplot at 0x24d2c19a948>

Vehicle counts in the 'class' column:

# 429 cars
# 218 buses
# 199 vans

Let us plot boxplots to understand the outliers in each column.

In [18]:
In [19]:
Out[19]:
Text(0.5, 1.0, 'Boxplot of Skewness About Column')
In [20]:
Out[20]:
Text(0.5, 1.0, 'Boxplot of Skewness About 1 Column')

Find the outliers and replace them with the median
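The outlier treatment in In [21]–In [22] is not shown; a minimal sketch, assuming the common 1.5 × IQR fence (the notebook only says "replace them with the median"), is:

```python
import pandas as pd

def replace_outliers_with_median(s: pd.Series) -> pd.Series:
    """Replace values outside Q1 - 1.5*IQR .. Q3 + 1.5*IQR with the median."""
    q1, q3 = s.quantile(0.25), s.quantile(0.75)
    iqr = q3 - q1
    lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return s.where(s.between(lower, upper), s.median())

# Toy column: 138.0 lies far above the upper fence and gets the median.
s = pd.Series([57.0, 61.0, 63.0, 65.0, 138.0])
cleaned = replace_outliers_with_median(s)
```

Applied column-wise to the numeric features, this produces the "after outlier treatment" boxplots shown below.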

In [21]:
In [22]:

Boxplot of all columns after outlier treatment

In [23]:
Out[23]:
<seaborn.axisgrid.PairGrid at 0x24d2ab8af48>
In [24]:
Out[24]:
compactness circularity distance_circularity radius_ratio pr.axis_aspect_ratio max.length_aspect_ratio scatter_ratio elongatedness pr.axis_rectangularity max.length_rectangularity scaled_variance scaled_variance.1 scaled_radius_of_gyration scaled_radius_of_gyration.1 skewness_about skewness_about.1 skewness_about.2 hollows_ratio
compactness 1.000000 0.684887 0.789928 0.721925 0.192864 0.499928 0.812620 -0.788750 0.813694 0.676143 0.769871 0.806170 0.585243 -0.246681 0.197308 0.156348 0.298537 0.365552
circularity 0.684887 1.000000 0.792320 0.638280 0.203253 0.560470 0.847938 -0.821472 0.843400 0.961318 0.802768 0.827462 0.925816 0.068745 0.136351 -0.009666 -0.104426 0.046351
distance_circularity 0.789928 0.792320 1.000000 0.794222 0.244332 0.666809 0.905076 -0.911307 0.893025 0.774527 0.869584 0.883943 0.705771 -0.229353 0.099107 0.262345 0.146098 0.332732
radius_ratio 0.721925 0.638280 0.794222 1.000000 0.650554 0.463958 0.769941 -0.825392 0.744139 0.579468 0.786183 0.760257 0.550774 -0.390459 0.035755 0.179601 0.405849 0.491758
pr.axis_aspect_ratio 0.192864 0.203253 0.244332 0.650554 1.000000 0.150295 0.194195 -0.298144 0.163047 0.147592 0.207101 0.196401 0.148591 -0.321070 -0.056030 -0.021088 0.400882 0.415734
max.length_aspect_ratio 0.499928 0.560470 0.666809 0.463958 0.150295 1.000000 0.490759 -0.504181 0.487931 0.642713 0.401391 0.463249 0.397397 -0.335444 0.081898 0.141664 0.083794 0.413174
scatter_ratio 0.812620 0.847938 0.905076 0.769941 0.194195 0.490759 1.000000 -0.971601 0.989751 0.809083 0.960883 0.980447 0.799875 0.011314 0.064242 0.211647 0.005628 0.118817
elongatedness -0.788750 -0.821472 -0.911307 -0.825392 -0.298144 -0.504181 -0.971601 1.000000 -0.948996 -0.775854 -0.947644 -0.948851 -0.766314 0.078391 -0.046943 -0.183642 -0.115126 -0.216905
pr.axis_rectangularity 0.813694 0.843400 0.893025 0.744139 0.163047 0.487931 0.989751 -0.948996 1.000000 0.810934 0.947329 0.973606 0.796690 0.027545 0.073127 0.213801 -0.018649 0.099286
max.length_rectangularity 0.676143 0.961318 0.774527 0.579468 0.147592 0.642713 0.809083 -0.775854 0.810934 1.000000 0.750222 0.789632 0.866450 0.053856 0.130702 0.004129 -0.103948 0.076770
scaled_variance 0.769871 0.802768 0.869584 0.786183 0.207101 0.401391 0.960883 -0.947644 0.947329 0.750222 1.000000 0.943780 0.785073 0.025828 0.024693 0.197122 0.015171 0.086330
scaled_variance.1 0.806170 0.827462 0.883943 0.760257 0.196401 0.463249 0.980447 -0.948851 0.973606 0.789632 0.943780 1.000000 0.782972 0.009386 0.065731 0.204941 0.017557 0.119642
scaled_radius_of_gyration 0.585243 0.925816 0.705771 0.550774 0.148591 0.397397 0.799875 -0.766314 0.796690 0.866450 0.785073 0.782972 1.000000 0.215279 0.162970 -0.055667 -0.224450 -0.118002
scaled_radius_of_gyration.1 -0.246681 0.068745 -0.229353 -0.390459 -0.321070 -0.335444 0.011314 0.078391 0.027545 0.053856 0.025828 0.009386 0.215279 1.000000 -0.057755 -0.123996 -0.832738 -0.901332
skewness_about 0.197308 0.136351 0.099107 0.035755 -0.056030 0.081898 0.064242 -0.046943 0.073127 0.130702 0.024693 0.065731 0.162970 -0.057755 1.000000 -0.041734 0.086661 0.062619
skewness_about.1 0.156348 -0.009666 0.262345 0.179601 -0.021088 0.141664 0.211647 -0.183642 0.213801 0.004129 0.197122 0.204941 -0.055667 -0.123996 -0.041734 1.000000 0.074473 0.200651
skewness_about.2 0.298537 -0.104426 0.146098 0.405849 0.400882 0.083794 0.005628 -0.115126 -0.018649 -0.103948 0.015171 0.017557 -0.224450 -0.832738 0.086661 0.074473 1.000000 0.892581
hollows_ratio 0.365552 0.046351 0.332732 0.491758 0.415734 0.413174 0.118817 -0.216905 0.099286 0.076770 0.086330 0.119642 -0.118002 -0.901332 0.062619 0.200651 0.892581 1.000000

df.corr() computes the pairwise correlation of columns.

Correlation measures how two variables are related to each other. Positive values show that as one variable increases, the other increases as well; negative values show that as one variable increases, the other decreases. Values closer to ±1 indicate strong correlation, while values near 0 indicate weak correlation.
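The heatmap produced in In [28] (only its axes repr survives below) can be sketched as follows; the seaborn call and the synthetic stand-in data are assumptions.

```python
import matplotlib
matplotlib.use("Agg")          # headless backend so the sketch runs anywhere
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Synthetic stand-in for the scaled feature frame (assumption).
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.normal(size=(50, 4)),
                  columns=["compactness", "circularity",
                           "scatter_ratio", "elongatedness"])
corr = df.corr()
# Annotated heatmap of the correlation matrix, symmetric about the diagonal.
ax = sns.heatmap(corr, annot=True, cmap="coolwarm", vmin=-1, vmax=1)
plt.tight_layout()
```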
In [28]:
Out[28]:
<matplotlib.axes._subplots.AxesSubplot at 0x24d3e1b75c8>

Splitting the data into dependent and independent variables. The dependent variable (y) is further encoded into a numeric categorical column.
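A minimal sketch of this split, assuming scikit-learn's `LabelEncoder` for the encoding step (the exact encoder used is not shown):

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Toy frame standing in for the cleaned dataset (assumption).
df = pd.DataFrame({"compactness": [95, 91, 104, 85],
                   "circularity": [48.0, 41.0, 50.0, 44.0],
                   "class":       ["van", "van", "car", "bus"]})
X = df.drop(columns="class")                    # independent variables
y = LabelEncoder().fit_transform(df["class"])   # classes sorted: bus, car, van
```

`LabelEncoder` assigns integers in sorted label order, so here bus → 0, car → 1, van → 2.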

In [26]:
In [27]:
Out[27]:
Text(0.5, 1.0, 'Correlation with Class column')

Using RobustScaler to standardize the values of each column. This is required in order to bring the input variables, which may be on different scales in their raw form, onto the same scale.
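The scaling step can be sketched as follows (synthetic one-column data as a stand-in). `RobustScaler` subtracts each column's median and divides by its interquartile range, so it is resistant to any remaining outliers.

```python
import numpy as np
from sklearn.preprocessing import RobustScaler

# Toy column (assumption): median = 93, IQR = 95 - 91 = 4.
X = np.array([[95.0], [91.0], [104.0], [93.0], [85.0]])
X_scaled = RobustScaler().fit_transform(X)   # (x - median) / IQR per column
```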

In [30]:
Out[30]:
compactness circularity distance_circularity radius_ratio pr.axis_aspect_ratio max.length_aspect_ratio scatter_ratio elongatedness pr.axis_rectangularity max.length_rectangularity scaled_variance scaled_variance.1 scaled_radius_of_gyration scaled_radius_of_gyration.1 skewness_about skewness_about.1 skewness_about.2 hollows_ratio
0 0.153846 0.444444 0.107143 0.204651 1.375 0.666667 0.098039 -0.076923 0.00 0.590909 -0.060302 0.057890 0.214286 -0.1875 0.000000 0.357143 -0.111111 0.000000
1 -0.153846 -0.333333 0.142857 -0.483721 -0.500 0.333333 -0.156863 0.153846 -0.25 -0.136364 -0.180905 -0.125117 -0.316327 0.0625 0.428571 0.214286 0.111111 0.186047
2 0.846154 0.666667 0.928571 0.781395 0.625 0.666667 0.980392 -0.846154 0.75 0.545455 0.884422 1.014006 0.948980 0.1875 1.142857 -0.142857 0.000000 -0.093023
3 0.000000 -0.333333 0.071429 -0.148837 0.250 0.333333 -0.254902 0.230769 -0.25 -0.136364 -0.381910 -0.203548 -0.948980 -1.0625 0.000000 -0.071429 1.222222 0.930233
4 -0.615385 0.000000 -0.357143 0.706977 0.000 0.000000 -0.156863 0.153846 -0.25 -0.090909 1.246231 -0.143791 0.295918 0.0000 0.428571 0.000000 -0.888889 -1.302326

Dimensional reduction:

In [31]:
Out[31]:
Text(0, 0.5, 'Percentage of Cumulative Explained Variance')
In [32]:
Out[32]:
Text(0.5, 1.0, 'Vehicle Dataset Explained Variance')

Findings after applying PCA on the dataset

The first seven components explain more than 95% of the variation; the first five alone already capture more than 91% of the information. Since the plot above shows roughly 95% of the variance covered by the first 7 components, we can drop the 8th component onwards.
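The check behind this finding can be sketched as follows, using synthetic stand-in data (an assumption) in place of the scaled feature matrix:

```python
import numpy as np
from sklearn.decomposition import PCA

# Synthetic matrix where only the first few columns carry real variance.
rng = np.random.default_rng(1)
X = rng.normal(size=(100, 8))
X[:, 4:] *= 0.05                     # later columns contribute little

pca = PCA().fit(X)                   # keep all components
cum = np.cumsum(pca.explained_variance_ratio_)
# Smallest number of components whose cumulative ratio passes 95%.
n_95 = int(np.searchsorted(cum, 0.95) + 1)
```

On the real scaled dataset the same computation gives 7 components for the 95% threshold.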

Eigenvalues

In [33]:
Eigen Values: 

Out[33]:
array([3.96100935e+00, 1.65870806e+00, 5.13125294e-01, 4.94835514e-01,
       3.94116009e-01, 2.96614528e-01, 1.39542157e-01, 9.65854893e-02,
       6.14513508e-02, 3.54768843e-02, 2.88836973e-02, 2.31031113e-02,
       1.73160148e-02, 1.31973434e-02, 1.21370214e-02, 9.51277726e-03,
       8.47124019e-03, 2.15456314e-03])
In [34]:
Eigen Vectors: 

Out[34]:
array([[ 2.70447593e-01,  3.08703426e-01,  2.65942302e-01,
         2.52585109e-01,  1.09214594e-01,  2.13874526e-01,
         3.15596351e-01, -2.89988322e-01,  3.10540787e-01,
         2.89929102e-01,  2.91291670e-01,  3.10633904e-01,
         2.75022976e-01, -5.33629155e-02,  3.81838626e-02,
         5.63034801e-02,  4.22027355e-02,  9.45420647e-02],
       [-7.47916800e-02,  1.26651078e-01, -3.48093808e-02,
        -1.62272484e-01, -2.48978101e-01, -1.08750428e-01,
         7.36512380e-02, -1.53458109e-02,  8.42168152e-02,
         1.13034088e-01,  7.29964089e-02,  7.02249862e-02,
         1.97981239e-01,  5.56982050e-01, -1.57637616e-02,
        -7.55602532e-02, -4.86108063e-01, -4.99531014e-01],
       [-1.18830485e-01, -5.87844882e-02, -6.00864271e-02,
         2.49637579e-01,  6.08795859e-01, -3.65061020e-01,
         4.64038791e-02, -9.41521128e-02,  2.21250572e-02,
        -1.41319498e-01,  1.17492762e-01,  5.55858661e-02,
        -7.03700592e-03,  1.66474566e-01, -5.67582188e-01,
        -5.73388419e-02,  2.12686548e-02, -1.02793570e-01],
       [ 4.27060563e-02, -2.04641952e-01,  9.83827980e-02,
        -4.18919783e-02, -3.57611140e-01, -1.78522101e-02,
         1.16625635e-01, -6.62459216e-02,  1.23542803e-01,
        -1.75173901e-01,  1.17106065e-01,  1.15358975e-01,
        -2.26070778e-01, -1.47603206e-03, -3.61524924e-01,
         7.35663856e-01, -4.78911374e-02,  1.98419487e-02],
       [ 1.67711300e-01, -1.33205168e-01, -4.67223517e-02,
         9.15218435e-02,  2.84034315e-02, -6.14983560e-01,
         9.22603989e-02, -7.21366890e-02,  9.12406664e-02,
        -2.27643275e-01,  1.36375796e-01,  1.15921561e-01,
        -7.23126238e-03,  1.14889835e-01,  6.09873173e-01,
         1.46764585e-01,  2.15089071e-01, -7.52327666e-02],
       [ 2.09361193e-01, -2.48759558e-02, -2.48803331e-02,
        -1.01633246e-01, -5.29111595e-01, -3.09599292e-01,
         8.68104798e-02, -6.92136470e-02,  9.04204493e-02,
        -4.02947717e-02,  1.23278276e-01,  1.03231296e-01,
        -1.29672770e-02, -1.42655065e-01, -3.69345127e-01,
        -5.36046078e-01,  2.73011291e-01,  4.46606823e-02],
       [ 2.56047244e-01, -3.97277488e-01,  1.26260693e-01,
         1.48018259e-01,  8.93832328e-02,  3.89618713e-01,
         1.08567589e-01, -1.07616048e-01,  1.10572144e-01,
        -3.40242267e-01,  7.70331930e-02,  1.11253556e-01,
        -4.57308852e-01,  1.02963750e-01,  1.16626848e-01,
        -3.29492051e-01, -2.08056920e-01, -1.55271051e-01],
       [-6.72474213e-01, -7.29589148e-02,  2.31440093e-01,
         1.56514129e-01, -1.35692535e-01, -5.22763904e-02,
         7.38014685e-02, -1.56224160e-01,  3.27133760e-02,
        -2.47536559e-01,  2.00219027e-01,  5.14073055e-02,
         1.65304825e-01, -4.43869020e-01,  1.11614812e-01,
        -1.07424651e-01, -2.20835307e-01, -1.05876724e-01],
       [ 5.04749996e-01,  7.54254846e-02, -1.91860933e-02,
         1.67066656e-01,  4.57334114e-02, -1.51396449e-01,
        -1.10022247e-01,  2.73683172e-01, -5.27664454e-02,
        -7.80907426e-02, -3.34964288e-02, -1.11072399e-01,
         1.64193565e-01, -5.29810081e-01, -6.70141878e-02,
         7.73235518e-02, -3.89770731e-01, -3.26876744e-01],
       [-1.70161320e-01,  1.91667239e-01, -1.30900439e-01,
        -7.93328887e-02,  8.02218624e-02, -2.11369611e-01,
         1.40125209e-01,  2.99693926e-02,  2.19657398e-01,
         4.31778188e-01, -2.04159107e-01,  2.65820027e-01,
        -6.23186733e-01, -2.57227200e-01,  5.92845950e-02,
        -2.13143586e-02, -1.37845735e-01, -1.21298442e-01],
       [-5.91961763e-02,  1.20828194e-01, -1.12277136e-01,
         5.33649977e-01, -2.06548564e-01,  5.31959710e-02,
        -1.51409202e-01,  2.56654983e-02, -2.36963253e-01,
         2.65824481e-01,  4.64339984e-01, -3.35431299e-01,
        -2.93177990e-01,  8.54547293e-02,  2.95111154e-02,
         3.18033273e-02,  1.60083604e-01, -1.99317195e-01],
       [-8.00226412e-02, -8.73886530e-02, -8.46394803e-01,
         1.96819029e-01, -7.43089131e-02,  2.18572783e-01,
         9.06203662e-02,  8.08557468e-02,  1.55313763e-01,
        -1.10430781e-01,  4.77221683e-02,  2.83859784e-01,
         1.91772086e-01, -6.25623542e-02,  1.52476840e-02,
         2.68965786e-02, -8.81648689e-03,  4.35522178e-02],
       [-1.61951284e-02, -1.30389125e-04,  1.20718391e-01,
         3.98296806e-01, -1.49226840e-01, -2.04382687e-01,
        -5.25668791e-02,  3.34809933e-01,  8.54563860e-02,
         1.84461015e-02, -7.21473417e-02,  6.16804636e-02,
        -3.71288366e-02,  2.12539991e-01,  9.22179622e-03,
        -9.07299498e-02, -4.23450305e-01,  6.29913350e-01],
       [ 8.90092284e-02,  2.34788364e-01, -2.54759008e-01,
        -5.20363445e-02, -1.52077013e-02, -8.15077575e-02,
         1.98060805e-01, -5.61694740e-01,  1.21970560e-01,
        -1.55839047e-01, -7.90288078e-02, -5.70530777e-01,
        -9.86359854e-02, -3.75206896e-02,  1.68096200e-02,
        -1.54878715e-03, -2.70663190e-01,  2.28350752e-01],
       [ 1.81174456e-02, -4.24160944e-01, -8.59924473e-02,
        -4.03513503e-01,  1.79352792e-01, -5.39192664e-02,
         1.13714507e-02,  1.42928644e-01,  1.68509472e-01,
         3.31984132e-01,  5.75788525e-01, -2.05288637e-01,
         4.70647724e-02, -1.01917682e-01,  3.08510103e-02,
        -2.95690163e-02, -1.78756903e-01,  1.87727412e-01],
       [-9.79830294e-02, -2.17721077e-01,  9.36838286e-02,
         2.15617724e-01, -4.53750699e-02,  3.91866570e-02,
         1.38403469e-01,  2.36761364e-01,  6.48684390e-01,
         1.01900075e-01, -2.98374673e-01, -4.19783488e-01,
         1.23431648e-01,  3.64023472e-02, -2.26568769e-02,
        -5.95437277e-04,  2.35956692e-01, -1.99381053e-01],
       [ 5.86160578e-02, -5.59237233e-01, -1.66532048e-02,
         2.28644993e-01, -8.29931431e-02, -9.57677596e-02,
        -2.95772789e-02, -4.31556168e-01, -2.68831115e-01,
         4.48429546e-01, -3.28751730e-01,  6.97811832e-02,
         1.74573742e-01, -4.07231770e-02, -8.99809891e-03,
         1.63625971e-02, -9.39713962e-02, -2.16128031e-02],
       [-1.98478990e-02, -4.43227571e-02,  6.93086255e-03,
         4.13402027e-03,  8.42422687e-03,  4.32506557e-03,
         8.49064303e-01,  2.88298024e-01, -4.19497612e-01,
         1.79622778e-02, -4.12832261e-02, -1.20142470e-01,
         1.56716751e-02,  1.77961877e-03, -4.84621779e-04,
        -9.81617880e-03,  2.57192135e-02, -3.81653921e-03]])

The percentage of variation explained by each eigenvector

In [35]:
The percentage of variation explained by each Eigen Vector: 

Out[35]:
array([5.10029196e-01, 2.13579284e-01, 6.60712607e-02, 6.37162241e-02,
       5.07473357e-02, 3.81928080e-02, 1.79677874e-02, 1.24365825e-02,
       7.91262536e-03, 4.56808990e-03, 3.71913510e-03, 2.97481279e-03,
       2.22965218e-03, 1.69932203e-03, 1.56279239e-03, 1.22488833e-03,
       1.09077749e-03, 2.77426790e-04])
In [36]:
Out[36]:
Text(0.5, 0, 'Principal Components')
In [45]:
Out[45]:
Text(0.5, 0, 'Principal Components')
In [46]:
Out[46]:
Text(0.5, 0, 'Principal Components')

In the above graph, the blue line represents component-wise explained variance while the orange line represents the cumulative explained variance.

Applying PCA on 7 Components

In [47]:
Original number of features: 18
Reduced number of features: 7
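The reduction in In [47] can be sketched as follows (synthetic stand-in data in place of the scaled feature matrix):

```python
import numpy as np
from sklearn.decomposition import PCA

# Stand-in for the 846 x 18 scaled feature matrix (assumption).
rng = np.random.default_rng(2)
X_scaled = rng.normal(size=(846, 18))

# Project the 18 features onto the first 7 principal components.
X_pca = PCA(n_components=7).fit_transform(X_scaled)
print("Original number of features:", X_scaled.shape[1])
print("Reduced number of features:", X_pca.shape[1])
```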
In [48]:
Out[48]:
0 1 2 3 4 5 6
0 0.446376 -0.475441 0.415578 -0.591180 -0.628784 -1.166228 -0.141708
1 -0.969205 -0.247348 -0.860688 0.132533 -0.118024 -0.153620 0.121625
2 2.471311 0.202940 -0.292011 -0.761633 0.598456 -0.485311 0.517424
3 -0.945422 -2.161404 -0.350315 -0.148290 -0.333014 0.219506 0.096389
4 -0.440204 1.005528 0.217450 -0.347911 0.147644 -0.579934 0.214694

Pairplot of PCA Dataset

In [49]:
Out[49]:
<seaborn.axisgrid.PairGrid at 0x24d4469fc88>

Splitting the data into training (70%) and test (30%) sets.

In [57]:
The training set comprises 592 rows and 18 columns.
In [58]:
The test set comprises 254 rows and 18 columns.

PCA reduced Dataset

In [59]:
The PCA training set comprises 592 rows and 7 columns.
In [60]:
The PCA test set comprises 254 rows and 7 columns.

Classifier:

SVC Model with PCA
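The training/evaluation loop used for both the PCA and the original feature sets can be sketched as follows; default SVC hyperparameters and synthetic stand-in data are assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Synthetic 3-class, 7-feature problem standing in for the PCA data.
X, y = make_classification(n_samples=300, n_features=7, n_informative=5,
                           n_classes=3, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.30,
                                          random_state=42)
model = SVC().fit(X_tr, y_tr)        # default RBF kernel
pred = model.predict(X_te)
acc = accuracy_score(y_te, pred)     # test accuracy
cm = confusion_matrix(y_te, pred)    # 3 x 3 confusion matrix
```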

In [61]:
SVC Model of dataset with PCA

Accuracy Score of Training Data: 0.839527027027027

Accuracy Score of Test Data: 0.8543307086614174

Classification Report of SVC Model:
                precision    recall  f1-score   support

           0       0.81      0.86      0.84        71
           1       0.92      0.83      0.87       125
           2       0.79      0.90      0.84        58

    accuracy                           0.85       254
   macro avg       0.84      0.86      0.85       254
weighted avg       0.86      0.85      0.86       254


Mean Absolute Error of SVC:
 0.1732283464566929

Confusion Matrix of SVC:
 [[ 61   7   3]
 [ 10 104  11]
 [  4   2  52]]
Out[61]:
Text(0.5, 1, 'Confusion Matrix HeatMap of SVC with PCA Model')
In [62]:
Precision Score : 0.84
Recall Score : 0.86
F1-Score : 0.85
Accuracy Score : 0.85

SVC Model without PCA

In [63]:
SVC Model of dataset without PCA

Accuracy Score of Training Data: 0.9425675675675675

Accuracy Score of Test Data: 0.9448818897637795

Classification Report of SVC Model:
                precision    recall  f1-score   support

           0       0.94      0.94      0.94        71
           1       0.97      0.93      0.95       125
           2       0.90      0.98      0.94        58

    accuracy                           0.94       254
   macro avg       0.94      0.95      0.94       254
weighted avg       0.95      0.94      0.94       254


Mean Absolute Error of SVC:
 0.05905511811023622

Confusion Matrix of SVC:
 [[ 67   3   1]
 [  4 116   5]
 [  0   1  57]]
Out[63]:
Text(0.5, 1, 'Confusion Matrix HeatMap of SVC Model')
In [64]:
Precision Score : 0.94
Recall Score : 0.95
F1-Score : 0.94
Accuracy Score : 0.94

Dataframe showing results of models with and without PCA

In [65]:
Out[65]:
Model Accuracy Score of Training Data Accuracy Score of Test Data Recall Score Precision Score
0 Support Vector Classifier with PCA 83.952703 85.433071 86.256888 84.052203
1 Support Vector Classifier without PCA 94.256757 94.488189 95.147353 93.836351

We can see that the accuracy score on the test data dropped with PCA by about 9 percentage points (from 94.49% to 85.43%).

Grid Search with PCA dataset

In [66]:
Fitting 5 folds for each of 8 candidates, totalling 40 fits


Best Parameters:
 {'C': 1, 'kernel': 'rbf'}


Best Estimators:
 SVC(C=1)

According to the grid search, the best model is an SVC with an RBF kernel and regularization parameter C = 1.
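The grid search can be sketched as follows: 8 candidates (4 values of C × 2 kernels) with 5-fold cross-validation, matching the "Fitting 5 folds for each of 8 candidates" output above. The exact C grid and the synthetic stand-in data are assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

# Synthetic 3-class problem standing in for the PCA feature matrix.
X, y = make_classification(n_samples=200, n_features=7, n_informative=5,
                           n_classes=3, random_state=0)

# 4 C values x 2 kernels = 8 candidates; cv=5 gives 40 fits in total.
param_grid = {"C": [0.1, 1, 10, 100], "kernel": ["linear", "rbf"]}
grid = GridSearchCV(SVC(), param_grid, cv=5).fit(X, y)
best = grid.best_params_   # e.g. {'C': 1, 'kernel': 'rbf'} on the real data
```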

In [67]:
Accuracy Score of Training Data: 0.9375

Accuracy Score of Test Data: 0.9133858267716536

Classification Report of SVC Model:
                precision    recall  f1-score   support

           0       0.97      0.96      0.96        71
           1       0.93      0.90      0.92       125
           2       0.81      0.88      0.84        58

    accuracy                           0.91       254
   macro avg       0.90      0.91      0.91       254
weighted avg       0.92      0.91      0.91       254


Mean Absolute Error of SVC:
 0.09448818897637795

Confusion Matrix of SVC:
 [[ 68   2   1]
 [  1 113  11]
 [  1   6  51]]
Out[67]:
Text(0.5, 1, 'Confusion Matrix HeatMap of SVC with PCA Model')
In [68]:
Precision Score : 0.90
Recall Score : 0.91
F1-Score : 0.91
Accuracy Score : 0.91

Grid Search with original dataset

In [69]:
Fitting 5 folds for each of 8 candidates, totalling 40 fits


Best Parameters:
 {'C': 1, 'kernel': 'rbf'}


Best Estimators:
 SVC(C=1)

According to the grid search, the best model is an SVC with an RBF kernel and regularization parameter C = 1.

In [70]:
Accuracy Score of Training Data:  0.9662162162162162

Accuracy Score of Test Data: 0.9606299212598425

Classification Report of SVC Model:
                precision    recall  f1-score   support

           0       0.99      0.99      0.99        71
           1       0.98      0.96      0.97       125
           2       0.90      0.93      0.92        58

    accuracy                           0.96       254
   macro avg       0.95      0.96      0.96       254
weighted avg       0.96      0.96      0.96       254


Mean Absolute Error of SVC:
 0.047244094488188976

Confusion Matrix of SVC:
 [[ 70   0   1]
 [  0 120   5]
 [  1   3  54]]
Out[70]:
Text(0.5, 1, 'Confusion Matrix HeatMap of SVC Grid Search Model')
In [71]:
Precision Score : 0.95
Recall Score : 0.96
F1-Score : 0.96
Accuracy Score : 0.96
In [72]:
Out[72]:
Model Accuracy Score of Training Data Accuracy Score of Test Data Recall Score Precision Score
0 Support Vector Classifier with PCA using Grid ... 93.750000 91.338583 91.368561 90.494556
1 Support Vector Classifier using Grid Search 96.621622 96.062992 95.898333 95.384175
In [73]:
Out[73]:
Model Accuracy Score of Training Data Accuracy Score of Test Data Recall Score Precision Score
0 Support Vector Classifier with PCA 83.952703 85.433071 86.256888 84.052203
1 Support Vector Classifier with PCA using Grid ... 93.750000 91.338583 91.368561 90.494556
2 Support Vector Classifier using Grid Search 96.621622 96.062992 95.898333 95.384175
3 Support Vector Classifier without PCA 94.256757 94.488189 95.147353 93.836351
In [74]:
Out[74]:
Text(0.5, 1.0, 'Comparison of Classification Models')

Conclusion:

Dimensionality reduction using PCA helped in this case study.
From the score summary of the SVM classifier above, the original model with grid search gives the best accuracy.
However, the model combining PCA with grid search performs nearly as well as the original model with grid search.
Using PCA, the input variables were reduced from 18 to 7 without significant loss of information from the data.
Hence the model with PCA-based dimensionality reduction takes less computational time.
Besides the reduced computational time, its performance is nearly on par with the original model.
Hence PCA plays a vital role in this case study.